ABSTRACT

Suppose that measurements $x_i$, $i = 1,\dots,k$, can be taken on a unit sequentially in that order at the prescribed costs $c_i$, $i = 1,\dots,k$. The unit comes from one of two populations $H_1$ and $H_2$, and it is desired to select the population (of these two) to which the unit belongs, on the basis of the measurements $x_1, x_2, \dots$. Given the loss incurred by selecting population $H_i$ when in fact the unit belongs to $H_j$, the prior probability $p_i$ of $H_i$ ($i = 1,2$), and assuming that $H_i$ has the normal distribution $N(\mu_i, V)$, $i = 1,2$, we derive the sequential Bayesian solution of the discrimination problem when $\mu_1$, $\mu_2$ and $V$ are known. When $\mu_1$, $\mu_2$ and $V$ are unknown and must be estimated, we propose a solution which is asymptotically Bayesian with exponential convergence rate.
1.  FORMULATION  OF  THE  PROBLEM

Let $H_1$, $H_2$ be two populations. We draw an individual $a$ randomly from one of them. The problem is to select the population from which $a$ is most likely to come. The selection is based upon measurements of variates (physical, chemical, biological, etc.) taken on the individual $a$, and the decision is reached sequentially in the following manner. First, the variates are divided into $k$ groups with a definite preference order. At the start we can make a decision or take measurements $x_1$ of the first group. We may choose to stop here and make a decision based on $x_1$, or we can go further and proceed to take measurements $x_2$ belonging to the second group. In general, after making observations on the first $i$ groups and recording the results $x_1, \dots, x_i$, we may decide to terminate observation and make a decision ($a$ belongs to $H_1$ or $H_2$), or we can go a step further and proceed to observe the $(i+1)$-th group. Since there are only $k$ groups of measurements, a final decision must be made after $k$ stages of observation. We suppose that the cost of observing the $i$-th group is a constant $c_i$, $i = 1,\dots,k$. These constants do not depend upon the results $x_1, \dots, x_k$ of observations on these $k$ groups of measurements.

The motivation behind such a scheme is clear: usually we have some prior knowledge concerning the importance of the various variates in the discrimination of an individual. The gain in reliability of discrimination obtained by observing more variates must be weighed against the cost we pay in obtaining the measurements of these variates (see Wald (1947, 1950)).

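Denote $X_{(i)} = (X_1', \dots, X_i')'$, $i = 1,\dots,k$. Assume that under $H_j$, the distribution of $X_{(i)}$ is normal $N(\mu_{j(i)}, V_{(i)})$, where

$$\mu_{j(i)} = \begin{pmatrix} \mu_{j1} \\ \vdots \\ \mu_{ji} \end{pmatrix}, \quad j = 1,2; \qquad V_{(i)} = \begin{pmatrix} V_{11} & V_{12} & \cdots & V_{1i} \\ V_{21} & V_{22} & \cdots & V_{2i} \\ \vdots & \vdots & & \vdots \\ V_{i1} & V_{i2} & \cdots & V_{ii} \end{pmatrix}.$$

Denote

$$U_{(i)} = (V_{i+1,1}, V_{i+1,2}, \dots, V_{i+1,i}), \quad i = 1,2,\dots,k-1,$$

$$W_i = V_{ii} - U_{(i-1)} V_{(i-1)}^{-1} U_{(i-1)}', \quad i = 2,3,\dots,k; \qquad W_1 = V_{11},$$

$$t_{ji}(x_{(i)}) = \mu_{j,i+1} + U_{(i)} V_{(i)}^{-1} (x_{(i)} - \mu_{j(i)}), \quad i = 1,\dots,k-1, \ j = 1,2.$$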
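The quantities $t_{ji}$ and $W_{i+1}$ are the conditional mean and covariance of the next group given the groups already observed. The following minimal sketch evaluates them for $k = 2$ under the simplifying assumption (made here only for illustration) that each group consists of a single variate, so that every block of $V$ is a scalar. The function name `conditional_moments` is hypothetical.

```python
def conditional_moments(x1, mu_j, V):
    """Return (t_j1(x1), W_2) for k = 2 with one variate per group.

    mu_j = (mu_j1, mu_j2) is the mean vector under H_j and
    V = ((V11, V12), (V21, V22)) is the common covariance matrix.
    Scalar blocks are an assumption of this sketch, not of the text.
    """
    (V11, V12), (V21, V22) = V
    U1 = V21                                 # U_(1) = V_21
    W2 = V22 - U1 * V12 / V11                # W_2 = V_22 - U_(1) V_(1)^{-1} U_(1)'
    t = mu_j[1] + U1 / V11 * (x1 - mu_j[0])  # t_j1(x_1)
    return t, W2


# Unit variances, correlation 0.5, zero means, first measurement x1 = 1.
t, W2 = conditional_moments(1.0, (0.0, 0.0), ((1.0, 0.5), (0.5, 1.0)))
print(t, W2)  # 0.5 0.75
```

With vector-valued groups the divisions by `V11` become multiplications by a matrix inverse, which a linear-algebra library would handle.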

If $a \in H_r$, the loss incurred by discriminating $a$ into $H_s$ is $\ell_{rs}$, $r,s = 1,2$. We shall assume that $\ell_{22} < \ell_{21}$, $\ell_{11} < \ell_{12}$. The prior probabilities of $H_1$ and $H_2$ are $p_1$ and $p_2$, respectively, with $0 < p_1 < 1$, $p_1 + p_2 = 1$.

The problem is to find the Bayes discrimination rule under the circumstances described above.


2.  THE  FORM  OF  BAYESIAN  SOLUTION 

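In the sequel we use $f(\cdot, \nu, \Sigma)$ to denote the density function of $N(\nu, \Sigma)$. As is well known, if $X_{(k)} = x_{(k)}$ has been observed, the Bayesian discrimination rule is

$$\frac{f(x_{(k)}, \mu_{2(k)}, V_{(k)})}{f(x_{(k)}, \mu_{1(k)}, V_{(k)})} \le \frac{p_1(\ell_{11} - \ell_{12})}{p_2(\ell_{22} - \ell_{21})}: \ \text{accept } H_1; \qquad \frac{f(x_{(k)}, \mu_{2(k)}, V_{(k)})}{f(x_{(k)}, \mu_{1(k)}, V_{(k)})} > \frac{p_1(\ell_{11} - \ell_{12})}{p_2(\ell_{22} - \ell_{21})}: \ \text{accept } H_2. \tag{1}$$

The rule can be written as follows. When

$$\begin{aligned} \big(t_{2,k-1}(x_{(k-1)}) - t_{1,k-1}(x_{(k-1)})\big)' W_k^{-1} x_k \le{} & \tfrac12 \big[ t_{2,k-1}(x_{(k-1)})' W_k^{-1} t_{2,k-1}(x_{(k-1)}) - t_{1,k-1}(x_{(k-1)})' W_k^{-1} t_{1,k-1}(x_{(k-1)}) \big] \\ & + \tfrac12 \big[ (x_{(k-1)} - \mu_{2(k-1)})' V_{(k-1)}^{-1} (x_{(k-1)} - \mu_{2(k-1)}) - (x_{(k-1)} - \mu_{1(k-1)})' V_{(k-1)}^{-1} (x_{(k-1)} - \mu_{1(k-1)}) \big] \\ & + \log\big[ p_1(\ell_{11} - \ell_{12}) / p_2(\ell_{22} - \ell_{21}) \big], \end{aligned} \tag{2}$$

we accept $H_1$; otherwise we accept $H_2$.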
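The passage from the likelihood-ratio form (1) to the linear form (2) rests on the standard block decomposition of a Gaussian quadratic form. The sketch below spells out that step; $Q_j$ and the abbreviation $t_j = t_{j,k-1}(x_{(k-1)})$ are shorthand introduced here for illustration and are not notation of the original.

```latex
\begin{align*}
Q_j &:= (x_{(k)} - \mu_{j(k)})' V_{(k)}^{-1} (x_{(k)} - \mu_{j(k)}) \\
    &= (x_{(k-1)} - \mu_{j(k-1)})' V_{(k-1)}^{-1} (x_{(k-1)} - \mu_{j(k-1)})
       + (x_k - t_j)' W_k^{-1} (x_k - t_j), \\[4pt]
\log \frac{f(x_{(k)}, \mu_{2(k)}, V_{(k)})}{f(x_{(k)}, \mu_{1(k)}, V_{(k)})}
    &= -\tfrac12 (Q_2 - Q_1) \\
    &= (t_2 - t_1)' W_k^{-1} x_k
       - \tfrac12 \big[ t_2' W_k^{-1} t_2 - t_1' W_k^{-1} t_1 \big] \\
    &\quad - \tfrac12 \big[ (x_{(k-1)} - \mu_{2(k-1)})' V_{(k-1)}^{-1} (x_{(k-1)} - \mu_{2(k-1)})
       - (x_{(k-1)} - \mu_{1(k-1)})' V_{(k-1)}^{-1} (x_{(k-1)} - \mu_{1(k-1)}) \big].
\end{align*}
```

Taking logarithms in (1) and moving every term that does not involve $x_k$ to the right-hand side then gives inequality (2). The ratio inside the logarithm is positive because both $\ell_{11} - \ell_{12}$ and $\ell_{22} - \ell_{21}$ are negative.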
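Denote

$$D_i = W_{i+1}^{-1} \big( t_{2i}(x_{(i)}) - t_{1i}(x_{(i)}) \big) = W_{i+1}^{-1} \big( \mu_{2,i+1} - \mu_{1,i+1} + U_{(i)} V_{(i)}^{-1} (\mu_{1(i)} - \mu_{2(i)}) \big),$$

$$\begin{aligned} q_i ={} & \tfrac12 \big[ t_{2i}(x_{(i)})' W_{i+1}^{-1} t_{2i}(x_{(i)}) - t_{1i}(x_{(i)})' W_{i+1}^{-1} t_{1i}(x_{(i)}) \big] \\ & + \tfrac12 \big[ (x_{(i)} - \mu_{2(i)})' V_{(i)}^{-1} (x_{(i)} - \mu_{2(i)}) - (x_{(i)} - \mu_{1(i)})' V_{(i)}^{-1} (x_{(i)} - \mu_{1(i)}) \big] \\ & + \log\big[ p_1(\ell_{11} - \ell_{12}) / p_2(\ell_{22} - \ell_{21}) \big], \end{aligned}$$

so that inequality (2) reads $D_{k-1}' x_k \le q_{k-1}$.

Noticing that under $H_j$ the conditional distribution of $X_k$ given $X_{(k-1)} = x_{(k-1)}$ is $N(t_{j,k-1}(x_{(k-1)}), W_k)$, we see that the probability of fulfilling the inequality (2) is $m_{j,k-1}$ under $H_j$, where

$$m_{j,i} = \Phi\left( \frac{q_i - D_i' t_{ji}(x_{(i)})}{\sqrt{D_i' W_{i+1} D_i}} \right)$$

and $\Phi$ is the standard normal distribution function.

Therefore, if we have already observed $X_{(k-1)} = x_{(k-1)}$, then under this condition, the continuation of observing $X_k$ followed by a decision according to the rule (1) gives a conditional risk

$$\begin{aligned} L_3 = L_{3,k-1} = A_{k-1}^{-1} \big\{ & \ell_{11} m_{1,k-1} p_1 f(x_{(k-1)}, \mu_{1(k-1)}, V_{(k-1)}) + \ell_{21} m_{2,k-1} p_2 f(x_{(k-1)}, \mu_{2(k-1)}, V_{(k-1)}) \\ & + \ell_{12} (1 - m_{1,k-1}) p_1 f(x_{(k-1)}, \mu_{1(k-1)}, V_{(k-1)}) + \ell_{22} (1 - m_{2,k-1}) p_2 f(x_{(k-1)}, \mu_{2(k-1)}, V_{(k-1)}) \big\} \\ & + c_1 + c_2 + \dots + c_k. \end{aligned} \tag{3}$$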
On the other hand, if we make a decision without observing $X_k$, then the posterior risk is

$$L_1 = L_{1,k-1} = A_{k-1}^{-1} \big\{ p_1 f(x_{(k-1)}, \mu_{1(k-1)}, V_{(k-1)}) \ell_{11} + p_2 f(x_{(k-1)}, \mu_{2(k-1)}, V_{(k-1)}) \ell_{21} \big\} + c_1 + c_2 + \dots + c_{k-1} \tag{4}$$

when we classify the individual $a$ into $H_1$, and

$$L_2 = L_{2,k-1} = A_{k-1}^{-1} \big\{ p_1 f(x_{(k-1)}, \mu_{1(k-1)}, V_{(k-1)}) \ell_{12} + p_2 f(x_{(k-1)}, \mu_{2(k-1)}, V_{(k-1)}) \ell_{22} \big\} + c_1 + c_2 + \dots + c_{k-1} \tag{5}$$

when we classify $a$ into $H_2$. In (3)-(5), $A_{k-1}$ is defined by

$$A_i = p_1 f(x_{(i)}, \mu_{1(i)}, V_{(i)}) + p_2 f(x_{(i)}, \mu_{2(i)}, V_{(i)}).$$

Denote by $L_{i_0}$ the minimum value of $L_1$, $L_2$ and $L_3$. If $i_0 = 1$ or 2, we classify the individual $a$ into $H_1$ or $H_2$, respectively. Otherwise, we go on observing $X_k$, and make the final decision according to (1).

Let $G_{k-1}(x_{(k-1)}) = \min(L_1, L_2, L_3)$. It is the minimum posterior risk we can get based on having observed $X_{(k-1)} = x_{(k-1)}$ (stop here or continue to observe). In general, for any $i$, we define $G_i(x_{(i)})$ as the minimum posterior risk we can get based on having observed $X_{(i)} = x_{(i)}$ (stop here or continue to observe).

In the following we define $G_i(x_{(i)})$ by induction. Suppose that we have already defined $G_i(x_{(i)})$, $i = k-1, k-2, \dots, k-\ell$, and that $X_{(k-\ell-1)} = x_{(k-\ell-1)}$ has been observed. If we stop observing and classify $a$ into $H_1$ or $H_2$, then the posterior risk is

$$L_1 = L_{1,k-\ell-1} = A_{k-\ell-1}^{-1} \big\{ p_1 f(x_{(k-\ell-1)}, \mu_{1(k-\ell-1)}, V_{(k-\ell-1)}) \ell_{11} + p_2 f(x_{(k-\ell-1)}, \mu_{2(k-\ell-1)}, V_{(k-\ell-1)}) \ell_{21} \big\} + c_1 + c_2 + \dots + c_{k-\ell-1}$$

or

$$L_2 = L_{2,k-\ell-1} = A_{k-\ell-1}^{-1} \big\{ p_1 f(x_{(k-\ell-1)}, \mu_{1(k-\ell-1)}, V_{(k-\ell-1)}) \ell_{12} + p_2 f(x_{(k-\ell-1)}, \mu_{2(k-\ell-1)}, V_{(k-\ell-1)}) \ell_{22} \big\} + c_1 + c_2 + \dots + c_{k-\ell-1},$$

respectively. If we go on observing $X_{k-\ell}$, then the minimum risk we can get is $G_{k-\ell}(x_{(k-\ell-1)}, x_{k-\ell})$, according to the definition of $G_{k-\ell}(x_{(k-\ell)})$. Hence in this case the minimum posterior risk is

$$\begin{aligned} L_3 = L_{3,k-\ell-1} = A_{k-\ell-1}^{-1} \big\{ & p_1 f(x_{(k-\ell-1)}, \mu_{1(k-\ell-1)}, V_{(k-\ell-1)}) \, E_1\big( G_{k-\ell}(x_{(k-\ell-1)}, X_{k-\ell}) \mid x_{(k-\ell-1)} \big) \\ & + p_2 f(x_{(k-\ell-1)}, \mu_{2(k-\ell-1)}, V_{(k-\ell-1)}) \, E_2\big( G_{k-\ell}(x_{(k-\ell-1)}, X_{k-\ell}) \mid x_{(k-\ell-1)} \big) \big\}. \end{aligned}$$

Summing up, we get

$$G_{k-\ell-1}(x_{(k-\ell-1)}) = \min(L_1, L_2, L_3).$$

In this way we complete the induction process of defining $G_i(x_{(i)})$, $i = 1,\dots,k-1$. Finally, we define

$$G_0 = \min(L_{10}, L_{20}, L_{30})$$

with $L_{10} = p_1 \ell_{11} + p_2 \ell_{21}$, $L_{20} = p_1 \ell_{12} + p_2 \ell_{22}$, $L_{30} = E(G_1(X_{(1)}))$.
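The backward recursion $G_i = \min(L_1, L_2, L_3)$ can be made concrete on a toy problem. The sketch below deliberately replaces the normal model of the text with independent binary observations, so that the conditional expectations in $L_3$ become finite sums; this substitution, the function name `bayes_risk_G0` and the parameter `theta` are assumptions made purely for illustration. The weights `w1`, `w2` play the role of $p_j f(x_{(i)}, \mu_{j(i)}, V_{(i)})$.

```python
def bayes_risk_G0(p1, loss, costs, theta):
    """Backward induction G_0 = min(L10, L20, L30) for a toy binary model.

    loss[r][s]  : loss for classifying into H_{s+1} when H_{r+1} is true
    costs[i]    : cost c_{i+1} of observing group i+1
    theta[j][i] : P(X_{i+1} = 1 | H_{j+1}); observations are independent
                  binary variables (an assumption of this sketch only)
    """
    k = len(costs)

    def G(i, w1, w2, spent):
        A = w1 + w2                                  # A_i = p1 f1 + p2 f2
        L1 = (w1 * loss[0][0] + w2 * loss[1][0]) / A + spent
        L2 = (w1 * loss[0][1] + w2 * loss[1][1]) / A + spent
        if i == k:                                   # no group left to observe
            return min(L1, L2)
        L3 = 0.0                                     # expected risk if we continue
        for x in (0, 1):
            f1 = theta[0][i] if x == 1 else 1.0 - theta[0][i]
            f2 = theta[1][i] if x == 1 else 1.0 - theta[1][i]
            nw1, nw2 = w1 * f1, w2 * f2
            if nw1 + nw2 > 0.0:
                L3 += (nw1 + nw2) / A * G(i + 1, nw1, nw2, spent + costs[i])
        return min(L1, L2, L3)

    return G(0, p1, 1.0 - p1, 0.0)


zero_one = [[0.0, 1.0], [1.0, 0.0]]
print(bayes_risk_G0(0.5, zero_one, [0.05], [[0.9], [0.1]]))  # cheap observation
print(bayes_risk_G0(0.5, zero_one, [10.0], [[0.9], [0.1]]))  # prohibitive cost
```

In the first call observing is worthwhile and the risk drops well below the no-observation value 0.5; in the second the cost dominates and the rule stops at once. In the normal model the sum over `x` becomes an integral, which is why $L_{3i}$ is hard to evaluate for $i \le k-2$.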

Based upon the quantities just defined, we now introduce the following discrimination rule:

1°. First, determine $i_0$ such that $L_{i_0 0} = G_0$. If $i_0 = 1$ or 2, then we do not make any observation and classify the individual into $H_1$ or $H_2$, respectively. Otherwise, proceed to 2°.

2°. Determine the following three sets:

$$A_{11} = \{x_1 : L_{11} \le L_{21}, \ L_{11} \le L_{31}\},$$
$$A_{21} = \{x_1 : L_{11} > L_{21}, \ L_{31} \ge L_{21}\},$$
$$A_{31} = \{x_1 : L_{11} > L_{31}, \ L_{21} > L_{31}\},$$

and observe $X_1 = x_1$. If $x_1 \in A_{j1}$ for $j = 1,2$, then we stop observation and classify the individual into $H_1$ or $H_2$, respectively. Otherwise, proceed to 3°.

3°. In general, if we have not made a final decision after observing $x_i$, then determine the following three sets:

$$A_{1,i+1} = \{x_{i+1} : L_{1,i+1} \le L_{2,i+1}, \ L_{1,i+1} \le L_{3,i+1}\},$$
$$A_{2,i+1} = \{x_{i+1} : L_{1,i+1} > L_{2,i+1}, \ L_{3,i+1} \ge L_{2,i+1}\},$$
$$A_{3,i+1} = \{x_{i+1} : L_{1,i+1} > L_{3,i+1}, \ L_{2,i+1} > L_{3,i+1}\},$$

and observe $X_{i+1} = x_{i+1}$. If $x_{i+1} \in A_{j,i+1}$ for $j = 1,2$, then we stop observation and classify the individual into $H_1$ or $H_2$, respectively. Otherwise, we return to the beginning of 3° with $i$ changed to $i + 1$.


3.  PROOF  OF  BAYESIAN  PROPERTY  OF  THE  RULE 

Any sequential discrimination rule can be expressed in the form $(T, \delta)$, where $T$ is a "stopping time", i.e., $T$ takes $0, 1, 2, \dots, k$ as its values. Either $T = 0$ and then $\delta = H_1$ or $\delta = H_2$, or $T$ does not take the value 0. In this case for any $i \ge 1$, the set $\{x_{(k)} : T(x_{(k)}) = i\}$ has the form $A_i \times R^{d_i}$, where $A_i$ is a Borel set in the space of $x_{(i)}$ and $d_i$ is the sum of the dimensions of $x_{i+1}, \dots, x_k$. $\delta(x_{(T)})$ assumes the "values" $H_1$ or $H_2$, and $\{x_{(T)} : \delta(x_{(T)}) = H_1\}$ is a Borel set in the space of $x_{(T)}$. The Bayes risk of such a rule $(T, \delta)$ is

$$B(T, \delta) = p_1 E_1 \, \ell_{1,\delta}(X_{(T)}) + p_2 E_2 \, \ell_{2,\delta}(X_{(T)}).$$

Denoting by $(T^*, \delta^*)$ the discrimination rule given in Section 2, we have the following theorem:


THEOREM 1. For any $(T, \delta)$, we have

$$B(T, \delta) \ge B(T^*, \delta^*). \tag{7}$$

Proof. Obviously, $B(T, \delta) \ge B(T^*, \delta^*)$ for any $(T, \delta)$ when $T^* = 0$. In the following we assume that $k \ge 1$. It is trivial to verify that the conclusion of the theorem is true when $k = 1$. For the general case, we use induction. Suppose that the conclusion of Theorem 1 is true when $k$ is replaced by $k - 1$. We have only to show that for any $x_1$, the conditional risk (denoted by $R(T, \delta \mid x_1)$) of the discrimination rule $(T, \delta)$, under the condition that $X_1 = x_1$ is observed, is always greater than or equal to the conditional risk $R(T^*, \delta^* \mid x_1)$ of the discrimination rule $(T^*, \delta^*)$. Three cases are in order:

1°. According to $(T, \delta)$, we should go on observing $X_2$.

Since (after having observed $X_1$) there are at most $k - 1$ groups of measurements that may be observed, according to the induction assumption that the theorem holds for $k - 1$ groups of observations, if we continue to take observations according to the rule $(T^*, \delta^*)$ after having obtained $X_1 = x_1$, then the Bayes risk we get (which is $L_{31}$ in the previous notation) is not greater than $R(T, \delta \mid x_1)$. But if we use the rule $(T^*, \delta^*)$, then, after having observed $X_1 = x_1$, the minimum posterior risk we can get is $G_1(x_{(1)}) = \min(L_{11}, L_{21}, L_{31})$. Therefore

$$R(T^*, \delta^* \mid x_1) \le R(T, \delta \mid x_1). \tag{8}$$

2°. According to $(T, \delta)$, after having observed $X_1 = x_1$ we classify $a$ into $H_1$.

Now $R(T, \delta \mid x_1) = L_{11}$. But according to $(T^*, \delta^*)$, we have

$$R(T^*, \delta^* \mid x_1) = G_1(x_{(1)}) \le L_{11}.$$

So (8) is still true.

3°. According to $(T, \delta)$, after having observed $X_1 = x_1$, we classify $a$ into $H_2$.

This case is similar to 2°.

Therefore, we have shown that (8) is always true, and the theorem is proved.


4.  DETAILED  COMPUTATION  PROCEDURE  FOR  THE  CASE  OF  k  =  2 

When $k \le 2$, there are no computational difficulties in the application of the method. When $k > 2$, $L_{3i}$ with $i \le k - 2$ is not easy to compute, and the application of the method is quite involved.

A very important case in practice is $k = 2$. For this case, we detail the computation procedure as follows:

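1°. Compute $W_2 = V_{22} - V_{21} V_{11}^{-1} V_{12}$.

2°. Denote by $x_1$ the observation of the first group. Calculate

$$t_j(x_1) = \mu_{j2} + V_{21} V_{11}^{-1} (x_1 - \mu_{j1}), \quad j = 1,2.$$

3°. Compute

$$D = W_2^{-1} \big( \mu_{22} - \mu_{12} + V_{21} V_{11}^{-1} (\mu_{11} - \mu_{21}) \big),$$

$$\begin{aligned} q ={} & \tfrac12 \big\{ t_2(x_1)' W_2^{-1} t_2(x_1) - t_1(x_1)' W_2^{-1} t_1(x_1) \\ & \quad + (x_1 - \mu_{21})' V_{11}^{-1} (x_1 - \mu_{21}) - (x_1 - \mu_{11})' V_{11}^{-1} (x_1 - \mu_{11}) \big\} \\ & + \log\big[ p_1(\ell_{12} - \ell_{11}) / p_2(\ell_{21} - \ell_{22}) \big]. \end{aligned}$$

4°. Compute $m_j = \Phi\big( (q - D' t_j(x_1)) / \sqrt{D' W_2 D} \big)$, $j = 1,2$.

5°. Compute

$$A = p_1 f(x_1, \mu_{11}, V_{11}) + p_2 f(x_1, \mu_{21}, V_{11}),$$

$$L_1 = A^{-1} \big\{ p_1 f(x_1, \mu_{11}, V_{11}) \ell_{11} + p_2 f(x_1, \mu_{21}, V_{11}) \ell_{21} \big\} + c_1,$$

$$L_2 = A^{-1} \big\{ p_1 f(x_1, \mu_{11}, V_{11}) \ell_{12} + p_2 f(x_1, \mu_{21}, V_{11}) \ell_{22} \big\} + c_1,$$

$$\begin{aligned} L_3 = A^{-1} \big\{ & \ell_{11} m_1 p_1 f(x_1, \mu_{11}, V_{11}) + \ell_{21} m_2 p_2 f(x_1, \mu_{21}, V_{11}) \\ & + \ell_{12} (1 - m_1) p_1 f(x_1, \mu_{11}, V_{11}) + \ell_{22} (1 - m_2) p_2 f(x_1, \mu_{21}, V_{11}) \big\} + c_1 + c_2. \end{aligned}$$

6°. Find the smallest $i_0$ such that $L_{i_0} = \min(L_1, L_2, L_3)$. If $i_0 = 1$ or 2, then we classify the individual into $H_1$ or $H_2$, respectively. If $i_0 = 3$, then we go on observing $X_2$.

7°. Compute $D' x_2$. If $D' x_2 \le q$ ($D$ and $q$ have been computed in 3°), we classify $a$ into $H_1$. Otherwise, we classify $a$ into $H_2$.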
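The seven steps above can be sketched in a few lines when each of the two groups holds a single variate, so that all blocks of $V$ are scalars. That restriction, the function names `first_stage` and `second_stage`, and the use of `math.erf` for $\Phi$ are choices made for this illustration only. The sketch also assumes $D \ne 0$, i.e. that the second measurement carries some discriminating information.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def first_stage(x1, mu1, mu2, V, p1, loss, c1, c2):
    """Steps 1-6 for k = 2, one variate per group (assumption of this sketch).

    mu_j = (mu_j1, mu_j2); V = ((V11, V12), (V21, V22));
    loss[r][s] = loss for classifying into H_{s+1} when H_{r+1} is true.
    Returns (i0, D, q, (L1, L2, L3)).
    """
    p2 = 1.0 - p1
    (V11, V12), (V21, V22) = V
    W2 = V22 - V21 * V12 / V11                                    # step 1
    t1 = mu1[1] + V21 / V11 * (x1 - mu1[0])                       # step 2
    t2 = mu2[1] + V21 / V11 * (x1 - mu2[0])
    D = (mu2[1] - mu1[1] + V21 / V11 * (mu1[0] - mu2[0])) / W2    # step 3
    q = (0.5 * (t2 * t2 - t1 * t1) / W2
         + 0.5 * ((x1 - mu2[0]) ** 2 - (x1 - mu1[0]) ** 2) / V11
         + math.log(p1 * (loss[0][1] - loss[0][0])
                    / (p2 * (loss[1][0] - loss[1][1]))))
    scale = math.sqrt(D * W2 * D)                                 # requires D != 0
    m1 = phi((q - D * t1) / scale)                                # step 4
    m2 = phi((q - D * t2) / scale)
    g1 = p1 * normal_pdf(x1, mu1[0], V11)                         # step 5
    g2 = p2 * normal_pdf(x1, mu2[0], V11)
    A = g1 + g2
    L1 = (g1 * loss[0][0] + g2 * loss[1][0]) / A + c1
    L2 = (g1 * loss[0][1] + g2 * loss[1][1]) / A + c1
    L3 = (loss[0][0] * m1 * g1 + loss[1][0] * m2 * g2
          + loss[0][1] * (1.0 - m1) * g1
          + loss[1][1] * (1.0 - m2) * g2) / A + c1 + c2
    risks = (L1, L2, L3)
    i0 = risks.index(min(risks)) + 1                              # step 6
    return i0, D, q, risks


def second_stage(x2, D, q):
    """Step 7: final classification after observing x2."""
    return 1 if D * x2 <= q else 2


zero_one = [[0.0, 1.0], [1.0, 0.0]]
identity = ((1.0, 0.0), (0.0, 1.0))
i0, D, q, risks = first_stage(0.5, (0.0, 0.0), (1.0, 1.0), identity,
                              0.5, zero_one, 0.0, 0.01)
print(i0, risks)
if i0 == 3:
    print(second_stage(0.2, D, q))
```

In this usage example the first measurement lies exactly halfway between the two means, so stopping gives risk 0.5 either way and the rule continues to the second group. `list.index` returns the first minimizer, which matches the "smallest $i_0$" convention of step 6.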


5.  THE  CASE  WHEN  PARAMETERS  ARE  UNKNOWN 

In the discussion above, we have assumed that $p_1$, $p_2$, $\mu_1$, $\mu_2$ and $V$ are all known. In practice, such parameters are usually unknown or partially unknown. In such cases we must assume that some training samples are available to estimate the unknown parameters; the estimates will be denoted by $\hat p_{1n}$, $\hat p_{2n}$, $\hat\mu_{1n}$, $\hat\mu_{2n}$ and $\hat V_n$. Then we use these estimates to replace $p_1$, $p_2$, $\mu_1$, $\mu_2$ and $V$ in the above-defined algorithm. In this way we get a rule of discrimination which will be denoted by $(T_n, \delta_n)$, whose Bayesian risk is

$$B(T_n, \delta_n) = E\big( B(T_n(Y_{(n)}), \delta_n(Y_{(n)})) \big),$$

where $B(T_n(Y_{(n)}), \delta_n(Y_{(n)}))$ is to be understood as the Bayesian risk of the discrimination rule obtained by the above scheme, on condition that the training sample is fixed as $Y_{(n)}$. Since for any $Y_{(n)}$ it is true that

$$B(T_n(Y_{(n)}), \delta_n(Y_{(n)})) \ge B(T^*, \delta^*),$$

we shall always have

$$B(T_n, \delta_n) \ge B(T^*, \delta^*).$$

Now we proceed to prove the following theorem.

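THEOREM 2. If $\hat p_{1n}$, $\hat p_{2n}$, $\hat\mu_{1n}$, $\hat\mu_{2n}$ and $\hat V_n$ are consistent estimates of $p_1$, $p_2$, $\mu_1$, $\mu_2$ and $V$, respectively, then $\lim_{n\to\infty} B(T_n, \delta_n) = B(T^*, \delta^*)$.

The proof of the theorem is based on the following lemma.

LEMMA 1. Denote by $(\tilde T_n, \tilde\delta_n)$ the discrimination rule obtained by substituting $q_{1n}$, $q_{2n}$, $\nu_{1n}$, $\nu_{2n}$ and $\Sigma_n$ for $p_1$, $p_2$, $\mu_1$, $\mu_2$ and $V$ in the definition of $(T^*, \delta^*)$ in Section 2. Then we have

$$B(\tilde T_n, \tilde\delta_n) \to B(T^*, \delta^*) \tag{9}$$

if

$$q_{1n} \to p_1, \quad q_{2n} \to p_2, \quad \nu_{1n} \to \mu_1, \quad \nu_{2n} \to \mu_2 \quad \text{and} \quad \Sigma_n \to V. \tag{10}$$

Proof. We shall use $G_0(n)$, $G_i(x_{(i)}, n)$, $L_{ji}(n)$ to denote the quantities corresponding to $G_0$, $G_i(x_{(i)})$, $L_{ji}$ in defining $(\tilde T_n, \tilde\delta_n)$ by replacing $p_1$, etc., by $q_{1n}$, etc.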
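It is obvious that

$$B(T^*, \delta^*) = E G_0, \qquad B(\tilde T_n, \tilde\delta_n) = E G_0(n).$$

Therefore, on noticing the uniform boundedness of $G_0$ and $G_0(n)$ (not exceeding $\max(\ell_{ij})$), we see that in order to prove the lemma we need only prove

$$\lim_{n\to\infty} L_{j0}(n) = L_{j0}, \quad j = 1,2,3. \tag{11}$$

Since $L_{10}(n) = q_{1n} \ell_{11} + q_{2n} \ell_{21}$, $L_{20}(n) = q_{1n} \ell_{12} + q_{2n} \ell_{22}$, and $q_{1n} \to p_1$, $q_{2n} \to p_2$, we see that (11) is true for $j = 1,2$.

In order to prove (11) for $j = 3$, we use induction. First suppose that $k = 1$. According to the definition, we have

$$L_{30} = p_1 m_1 \ell_{11} + p_2 m_2 \ell_{21} + p_1 (1 - m_1) \ell_{12} + p_2 (1 - m_2) \ell_{22}, \tag{12}$$

$$L_{30}(n) = q_{1n} m_{1n} \ell_{11} + q_{2n} m_{2n} \ell_{21} + q_{1n} (1 - m_{1n}) \ell_{12} + q_{2n} (1 - m_{2n}) \ell_{22}, \tag{13}$$

where $m_1 = P(\xi \le 0 \mid \mu_1, V)$, $m_2 = P(\xi \le 0 \mid \mu_2, V)$, $m_{1n} = P(\xi_n \le 0 \mid \nu_{1n}, \Sigma_n)$, $m_{2n} = P(\xi_n \le 0 \mid \nu_{2n}, \Sigma_n)$,

$$\xi = x_1' V^{-1} (\mu_2 - \mu_1) - \tfrac12 \mu_2' V^{-1} \mu_2 + \tfrac12 \mu_1' V^{-1} \mu_1 - \log \frac{p_1(\ell_{11} - \ell_{12})}{p_2(\ell_{22} - \ell_{21})},$$

and

$$\xi_n = x_1' \Sigma_n^{-1} (\nu_{2n} - \nu_{1n}) - \tfrac12 \nu_{2n}' \Sigma_n^{-1} \nu_{2n} + \tfrac12 \nu_{1n}' \Sigma_n^{-1} \nu_{1n} - \log \frac{q_{1n}(\ell_{11} - \ell_{12})}{q_{2n}(\ell_{22} - \ell_{21})}.$$

It is clear that when (10) is true, the distribution of $\xi_n$ under $(\nu_{in}, \Sigma_n)$ converges to the distribution of $\xi$ under $(\mu_i, V)$, $i = 1,2$, which entails

$$m_{1n} \to m_1, \quad m_{2n} \to m_2 \quad \text{when } n \to \infty.$$

According to (12) and (13), we have $L_{30}(n) \to L_{30}$, and the case $k = 1$ is proved.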
Now we assume that the conclusion of the lemma is true for $k - 1$. Express $L_{30}$ and $L_{30}(n)$ as

$$L_{30} = E \min\big( L_{11}(X_1), L_{21}(X_1), L_{31}(X_1) \big),$$

$$L_{30}(n) = E \min\big( L_{11}(n, X_1), L_{21}(n, X_1), L_{31}(n, X_1) \big).$$

Based on the expressions of $L_{11}$, $L_{21}$ given in Section 2, we get

$$L_{j1}(n) \to L_{j1}, \quad j = 1,2. \tag{14}$$

Also, considering the expressions of $L_{31}(X_1)$ and $L_{31}(n, X_1)$, in order to prove that (14) is true for $j = 3$, we need only show that when (10) is true,

$$E\big( G_2(X_{(2)}, n) \mid X_1 \big) \to E\big( G_2(X_{(2)}) \mid X_1 \big) \tag{15}$$

for fixed $X_1$. For this purpose, we note that calculating the values of both sides of (15), on condition that $X_1$ is observed, is the same as calculating $E G_1(X_{(1)}, n)$ and $E G_1(X_{(1)})$ in the original problem with $k$ reduced to $k - 1$. Therefore the truth of (15) for any fixed $X_1$ follows directly from the induction hypothesis. From this, and the fact that $G_2(X_{(2)}, n)$ is uniformly bounded, it follows by the dominated convergence theorem that $L_{30}(n) \to L_{30}$ for $k$. Thus we have proved (11) and hence the lemma.

Now back to the proof of the theorem. By Lemma 1, for any $\varepsilon > 0$, we can take $\eta > 0$ small enough such that

$$|\hat p_{jn} - p_j| < \eta, \quad \|\hat\mu_{jn} - \mu_j\| < \eta, \quad j = 1,2, \quad \|\hat V_n - V\| < \eta \tag{16}$$

imply

$$|B(T_n(Y_{(n)}), \delta_n(Y_{(n)})) - B(T^*, \delta^*)| < \varepsilon.$$

By consistency we know that when $n$ is large enough, the probability that the inequalities in (16) are true simultaneously is not less than $1 - \varepsilon$. Also, noticing that $B(T_n(Y_{(n)}), \delta_n(Y_{(n)})) \le M = \max(\ell_{11}, \ell_{12}, \ell_{21}, \ell_{22})$, we get

$$|B(T_n, \delta_n) - B(T^*, \delta^*)| < \varepsilon + M\varepsilon$$

for $n$ large enough. This concludes the proof of the theorem.
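The final bound can be obtained by splitting the expectation over the event on which all inequalities in (16) hold. In the sketch below, $E_n$ denotes that event, $B_n = B(T_n(Y_{(n)}), \delta_n(Y_{(n)}))$ and $B^* = B(T^*, \delta^*)$; these abbreviations are introduced here for illustration, and the step $|B_n - B^*| \le M$ assumes nonnegative losses, so that $0 \le B^* \le B_n \le M$.

```latex
\begin{align*}
|B(T_n,\delta_n) - B(T^*,\delta^*)|
  &= \big| E(B_n - B^*) \big| \\
  &\le E\big( |B_n - B^*| \, \mathbf{1}_{E_n} \big)
     + E\big( |B_n - B^*| \, \mathbf{1}_{E_n^c} \big) \\
  &\le \varepsilon \, P(E_n) + M \, P(E_n^c) \\
  &\le \varepsilon + M\varepsilon .
\end{align*}
```

The first term uses the implication following (16), the second uses $P(E_n^c) \le \varepsilon$ for $n$ large enough.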

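Usually $Y_{(n)} = (Y_{11}, \dots, Y_{1n_1}, Y_{21}, \dots, Y_{2n_2})$, where $Y_{i1}, \dots, Y_{in_i}$ are i.i.d., $Y_{ij} \sim N(\mu_i, V)$ under $H_i$, $i = 1,2$. In this case we use the sample means

$$\hat\mu_{in} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}, \quad i = 1,2,$$

together with the pooled sample covariance matrix $\hat V_n$ formed from the sums $\sum_{i=1}^{2} \sum_{j=1}^{n_i} (Y_{ij} - \hat\mu_{in})(Y_{ij} - \hat\mu_{in})'$, to estimate $\mu_1$, $\mu_2$ and $V$. Also we use $\hat p_{in} = n_i / n$ to estimate $p_i$, $i = 1,2$, where we assume that $n_1 \sim B(n, p_1)$, $n_1 + n_2 = n$, $0 < p_1 < 1$.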
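A minimal sketch of these plug-in estimates for a single variate is given below. The function name `plug_in_estimates` is hypothetical, and the divisor $n$ in the pooled variance is an assumption of this sketch (a divisor $n - 2$ would serve equally well for consistency); the extracted text does not preserve the normalization used by the authors.

```python
def plug_in_estimates(sample1, sample2):
    """Estimate (p1, p2, mu1, mu2, V) from two scalar training samples.

    sample_i holds the training observations known to come from H_i.
    The pooled variance divides by n = n1 + n2, which is an assumption
    of this sketch rather than a detail confirmed by the source.
    """
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    mu1_hat = sum(sample1) / n1
    mu2_hat = sum(sample2) / n2
    # Pooled within-group sum of squares around each group's own mean.
    ss = (sum((y - mu1_hat) ** 2 for y in sample1)
          + sum((y - mu2_hat) ** 2 for y in sample2))
    v_hat = ss / n
    return n1 / n, n2 / n, mu1_hat, mu2_hat, v_hat


print(plug_in_estimates([0.0, 2.0], [5.0, 7.0, 9.0]))
# (0.4, 0.6, 1.0, 7.0, 2.0)
```

The returned values would then replace $p_1$, $p_2$, $\mu_1$, $\mu_2$ and $V$ in the procedure of Section 4.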

THEOREM 3. Under the conditions above, $B(T_n(Y_{(n)}), \delta_n(Y_{(n)}))$ converges to $B(T^*, \delta^*)$ at an exponential rate, i.e., for any $\varepsilon > 0$, there exists a constant $C > 0$ depending upon $\varepsilon$ but not upon $n$, such that

$$P\big( |B(T_n(Y_{(n)}), \delta_n(Y_{(n)})) - B(T^*, \delta^*)| \ge \varepsilon \big) = O(e^{-Cn}). \tag{17}$$

Proof. The proof runs largely along the same lines as that of Theorem 2, with the help of the following known result (see Petrov (1975)).

LEMMA 2. Let $X_1, X_2, \dots$ be an i.i.d. sequence of random variables with $E X_1 = 0$, and suppose there exists $\delta > 0$ such that

$$E(e^{tX_1}) < \infty \quad \text{for } |t| < \delta.$$

Then for any $\varepsilon > 0$ there exists a constant $C > 0$ depending upon $\varepsilon$ but not upon $n$, such that

$$P(|\bar X_n| \ge \varepsilon) = O(e^{-Cn}),$$

where $\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.
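For the centered indicator variables that appear in the estimate of $p_1$, the exponential decay asserted by Lemma 2 can be checked numerically. The sketch below computes the binomial tail probability directly and compares it with Hoeffding's inequality, $P(|\bar X_n - p| \ge \varepsilon) \le 2e^{-2n\varepsilon^2}$, which is one explicit bound of the required form; Hoeffding's inequality is brought in here for illustration and is not the result cited in the text. The function names are hypothetical.

```python
import math


def binomial_tail(n, p, eps):
    """P(|n1/n - p| >= eps) for n1 ~ Binomial(n, p), summed term by term."""
    total = 0.0
    for m in range(n + 1):
        if abs(m / n - p) >= eps:
            total += math.comb(n, m) * p ** m * (1.0 - p) ** (n - m)
    return total


def hoeffding_bound(n, eps):
    """Hoeffding's bound 2 exp(-2 n eps^2) for variables with values in [0, 1]."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)


for n in (20, 40, 80):
    print(n, binomial_tail(n, 0.3, 0.13), hoeffding_bound(n, 0.13))
```

Each doubling of $n$ shrinks the tail probability by a growing factor, and the computed tail stays below the bound, in line with the $O(e^{-Cn})$ rate with $C = 2\varepsilon^2$.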

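Turning to the proof of the theorem, we note that the random variables $\xi_1$ and $\xi_1^2 - \sigma^2$, where $\xi_1 \sim N(0, \sigma^2)$, and $\xi_2 - p_1$, where $\xi_2$ is defined by

$$P(\xi_2 = 1) = 1 - P(\xi_2 = 0) = p_1,$$

all satisfy the condition of Lemma 2. From this it is easily seen that for any given $\eta > 0$ we have

$$P(|\hat p_{in} - p_i| \ge \eta) = O(e^{-Cn}), \quad i = 1,2, \tag{18}$$

$$P(\|\hat\mu_{in} - \mu_i\| \ge \eta) = O(e^{-Cn}), \quad i = 1,2, \tag{19}$$

$$P(\|\hat V_n - V\| \ge \eta) = O(e^{-Cn}). \tag{20}$$

Now given an arbitrary $\varepsilon > 0$, according to Lemma 1, there exists $\eta > 0$ such that

$$\big( |\hat p_{1n}(Y_{(n)}) - p_1| < \eta, \ \|\hat\mu_{in}(Y_{(n)}) - \mu_i\| < \eta, \ i = 1,2; \ \|\hat V_n(Y_{(n)}) - V\| < \eta \big) \implies |B(T_n(Y_{(n)}), \delta_n(Y_{(n)})) - B(T^*, \delta^*)| < \varepsilon.$$

From this and (18)-(20), we get

$$\begin{aligned} P\big( |B(T_n(Y_{(n)}), \delta_n(Y_{(n)})) - B(T^*, \delta^*)| \ge \varepsilon \big) & \le \sum_{i=1}^{2} P(|\hat p_{in} - p_i| \ge \eta) + \sum_{i=1}^{2} P(\|\hat\mu_{in} - \mu_i\| \ge \eta) + P(\|\hat V_n - V\| \ge \eta) \\ & = O(e^{-Cn}), \end{aligned}$$

and the proof is concluded.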